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(57) Abstract 

A plurality of processing elements (1 1, 13, 15, 17) inde- 
pendently operate in parallel on separate streams of data but 
in response to common instruction. In order to selectively 
and individually enable each processing element, a control 
register stage (37) is provided for eacL Each register may be 
controlled, as between its enabling and disabling states with 
respect to execution of a common instruction, by the results 
of a test performed by its associated processor in response to 
a prior instruction and by the complement of the test results. 
The system is especially adapted to support flow of control 
operators, such as IF/THEN constructs, IF/THEN/ELSE 
constructs and WHILE/DO loop constructs. 
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SELECTIVE OPERATION OF PROCESSING 
ELEMENTS IN A SINGLE INSTRUCTION, 
MULTIPLE DATA STREAM {SIMP) COMPUTER SYSTEM 



Background of the Invention 

This invention relates generally to parallel 
data processing techniques and computer systems, and 
specifically to those of a type where each of a plurality 
of parallel processors simultaneously executes the same 
instruction on different data. Such a computer is 
commonly termed a single instruction, multiple data 
stream (SIMD) processor. 

There are many data processing applications 
wherein multiple streams of data may be processed in the 
same manner. An example is in the field of computer 
graphics where separate video red, green, blue and alpha 
digital signals may be processed identically. To achieve 
the highest processing rate, it is thus convenient to 
process these four data streams simultaneously with the 
same sequence of instructions. That is, at any given 
instant, separate red, green, blue and alpha data for a 
particular color display pixel are being simultaneously 
processed. 

Parallel processing is particularly fast if the 
program being executed on the parallel streams of data is 



wo 87/00318 



PCT/US86/01354 



2 

an invariant series of statements. It is more commonr 
however, that the controlling program includes condi- 
tional statements that depend for execution upon the data 
in each of the parallel processors. Since the data being 
5 processed in each stream will be different, provision 
must be made in this case for those processors whose data 
does icxot meet the condition of the program statement to be 
rendered non-operative during the time that the remaining 
processors are executing the particular statement. It is 

10 known that a WHILE-DO construct is the minimum needed to 
implement all possible flow control structures. 

A common example of such a conditional program 
instruction is an "IP-THEN" statement: that is, the 
individual processors are all instructed to perform a 

15 certain manipulation of their individual data streams, 
but only "if their data meets a certain condition 
expressed in the program instruction. Those processors 
whose data at that instant do not meet the condition do 
not execute that instruction. An "IF-THEN" instruction 

20 is often augmented by an "ELSE" modifier; that is, those 
processors not executing the "IP-THEN" statement are 
subsequently instructed to execute a different operation 
on their data at the next instant while those processors 
who did execute the "IF-THEN" instruction are rendered 

25 inoperative. 

It is a general object of the present invention 
to provide improved techniques and circuits for selec- 
tively controlling which of a plurality of parallel 
processors execute specific conditional instructions. 

30 Summary of the Invention 

This and additional objects are accomplished by 
the present invention, wherein, briefly, each of the 
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parallel processors has a separate control element, such 
as one bit of a control register, that enables the 
processor to execute a common, instruction given all 
processors when the element is in one state and disables 
5 the processor from executing that instruction when in its 
other state* The state of each control element is set to 
control execution of a particular statement dependent 
upon whether the data for that processor met the test of a 
previous instruction, such as an "IF-THEN" instruction. 

10 In subsequent complementary execution, such as occurs in 
an "ELSE" instruction, the states of the control elements 
are reversed so that those processors who did not execute 
the first statement will execute the subsequent state- 
ment, and vice versa. 

15 In addition, in order to provide a capability 

for nested excecution of such complementary types of 
instructions, a v memory device (a stack memory in a 
preferred embodiment) is provided to store the states of 
the individual control elements when the nested condi- 

20 tional statement occurs. When execution of the nested 
instruction is completed, the states of the control 
elements at the time of the nesting conditional statement 
are restored so that the processing of them may continue. 

Additional objects, features and advantages of 

25 the various aspects of the present invention will best be 
appreciated from a description of its preferred embodi- 
ments, which description should be taken in conjunction 
with the accompanying drawings. 

Brief Description of the Drawings 
30 Figure 1 illustrates in general block diagram 

form a SIMD processor; 

Figure 2 illustrates a first circuit embodiment 
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of the control circuits of the system of Figure 1; 

Figures 3 and 4 are tables which illustrate the 
operation of the system of Figure 1 when implemented with 
the control circuit of Figure 2; 
5 Figure 5 illustrates a second circuit embodi- 

ment of the control circuits of the system of Figure 1; 

Figures 6 and 7 are tables which illustrate the 
operation of the system of Figure 1 whem implemented with 
the control circuit of Figure 5; and 
10 Figure 8 provides logic details of another 

portion of the circuit of Figure 1. 

Description of the Preferred Embodiments 

Referring to Figure 1, the overall architecture 
of a computer system utilizing the various aspects of the 

15 present invention will be described. Separate pro- 
cessors 11, 13, 15 and 17 receive, respectively, inde- 
pendent data streams in input lines 19, 21, 23, and 25. 
Similarly, independent lines 27, 29, 31, and 33 carry, 
respectively, the outputs of the processing elements. 

20 Four parallel data processors are illustrated in this 
example, but it will be understood that the principles of 
the present invention apply to a parallel system contain- 
ing arbitrarily many parallel processing elements. Four 
processors are conveniently used in a graphics computer 

25 system, one channel used to process data of the red 
component of a video signal, another for the green 
component, a third for the blue, and a fourth for an alpha 
component that provides other information of the image. 
Parallel processing is particularly adapted for a gra- 

30 phics application since high speed processing is a 
requirement and the same sequence of program instructions 
is executed simultaneously on all four data paths. 
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There are certain program instructions, how- 
ever , that require one or more processing elements to not 
participate in executing a particular program instruction 
that is applied simultaneously through an instruction bus 
5 35 to all four of the processing elements 11 , 13, 15 and 
17. In order to control which of the four processing 
elements are active to execute a particular instruction, 
a control circuit is provided in association with each of 
them, such as a circuit 37 which controls operation of the 

10 processing element 11, A line 39 carries a signal to the 
processing element 11 which controls whether it is 
enabled to execute an instruction on the bus 35. For 
example, a voltage in line 39 representative of a logical 
will cause the processing element to execute the 

15 instruction, while a voltage representative of a logical 
"0" will disable the processing element during execution 
of that particular instruction by other of the processing 
elements. 

Each of the four control circuits of the system 
20 of Figure 1, such as the circuit 37, determines whether to 
enable its associated processing element, such as pro-- 
cesser 11, on the basis of several pieces of information. 
One is an initial condition which is presented external of 
the circuits of Figure 1 in a set line 41. Another piece 
25 of information is a status instruction in a bus 43 which 
specifies, for those processor instructions on bus 35 
that may require less than all of the processing elements 
to execute the instruction, additional instructions for 
determining the state of the enable signal in the line 39. 
30 A final piece of information is a true "1" or false "0" 
signal in a line 45 which gives the result of a test 
performed by the processing element 11 on its data in 
response to a current or immediately preceding instruct 
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tion on the bus 35. Each of the four control circuits 
shown in Figure 1 operates similarly, except that the test 
result input received from its associated processing 
element can be different and thus result in some 
5 processors being enabled and others being disabled at a 
given instant in time. 

The function of the control circuits in the 
system of Figure 1 is explained more fully with respect to 
its two preferred embodiments, one embodiment illustrated 

10 in Figures 2-4 and another in Figures 5-7. But before 
proceeding to those embodiments, some general items of 
the system of Figure 1 are first explained. The pro- 
cessor instructions in the bus 35 and the status 
instructions 43 originate from a micro-programmed control 

15 unit such as micro-sequencer 47. A micro-programmed 
control unit consists of the micro-program memory and the 
structure required to determine the address of the next 
microinstruction, specific implementations being well 
known. 

20 A logic circuit 49 has as inputs the invidiual 

test result lines of each of the processing elements. 
The logic circuit 49 generates a condition code in an 
output line 51 when the signals in the input test result 
lines are a particular one or more combinations. The 

25 signal in the line 51 is connected to the condition code 
input of the micro-sequencer 47, thus enabling a change in 
the sequence of instructions in response to a particular 
combination of test result outputs. Another input to the 
logic circuits 49 is by way of a line 53, an instruction 

30 field of the micro-sequencer 47. 

In a particular implementation of the system of 
Figure 1 for color computer graphics processing, each of 
the processing elements contains as primary components a 
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16-bit multiplier and a 16-bit arithmetic and logic unit 
(ALD). Extremely fast processing is desired in computer 
graphics applications because of the large number of 
pixels in each frame of a picture, each pixel being 
5 defined by four 16-bit words. 

Referring to Figure 2, a circuit is shown that 
is suitable for use, according to one embodiment, as each 
of the control circuits shown in Figure 1, such as the 
circuit 37* A flip-flop circuit 61 has its output 

10 connected to the enable line 39. An input line 63 is 
connected to an output of a four-position multiplexer 65. 
The multiplexer 65 has four separate inputs 0-3. The 
status instruction in the bus 43 selects which of the 
inputs 0-3 is connected to the output 63. The 0 input of 

15 the multiplexer is connected directly to the output of the 
flip-flop 61, thereby allowing the current state of the 
flip-flop 61 to be held when the multiplexer 65 is 
switched to its 0 input. Conversely, when switched to 
its number 3 input, the state of the flip-flop 61 is 

20 changed since its output is connected through an inverter 
67 back to its input. The number 1 and number 2 position 
input positions of the multiplexer 65 are the test result 
line 45 and the set line 41, respectively, previously 
discussed with respect to Figure 1. 

25 The specific circuit examples being described 

are particularly adapted for executing IP-THEN-ELSE 
program instructions. The table of Figure 3 summarizes 
the four possible states of the control circuit of Figure 
2, depending upon the status instruction on the bus 43. 

30 When the multiplexer 65 is switched to its 0 input, the 
output in the line 39 is held, the condition desired when 
the logical operation commanded by the instruction on the 
bus 35 of Figure 1 is to execute a statement. The next 
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status instruction, selecting the 1 input of the multi- 
plexer 65, causes the test result of its associated 
processing element to be stored, as previously described, 
an operation that accompanies an IF instruction in the bus 
5 35. The status instruction 2 causes the flip-flop 61 to 
be set, a status instruction on bus 43 that accompanies an 
END IF instruction in the processing element instruction 
bus 35. Lastly, a status instruction 3 causes the flip- 
flop element 61 to change state in order to enable those 
10 processors previously disabled, and conversely to disable 
those processors previously enabled* The status in- 
struction 3 is presented in the bus 43 simultaneously with 
the ELSE instruction in the bus 35. Micro-code in the 
micro-sequencer 47 assures that the instructions in the 
15 buses 35 and 43 correspond according to the table of 
Figure 3 in accordance with other particular requirements 
of any application. 

The table of Figure 4 better explains the 
operation of the circuit of Figure 1, when using a control 
20 circuit of Figure 2, by a specific example. Consider the 
example of an IP statement asking whether the data input 
to each processing element .(D^) is greater than 1. As 
shown in line 2 of the table of Figure 4, it is assumed in 
the "test result" column that the first and third 
25 processing elements have passed the test, thus showing 
the logical "1" in their test result output lines 45, 
while the second and fourth processors have failed the 
test, and thus show a test result logical signal of "0". 
Even though each processor is executing the same IF 
instruction, the results of the test performed by each can 
be different because the data being processed by each is 
generally different. 

At the same time the IF instruction is being 
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executed, the status instruction on the bus 43 causes the 
multiplexer 65 of each of the control circuits of the 
system of Figure 1 to switch to its position 1 to receive 
the test results from their corresponding processors. 
5 These test results, whether a test pass "l" or fail "0", 
are then stored in the individual flip-flop elements. 
The enable signal outputs of the four flip-flops are given 
as the enable signals in the table of Figure 4, referred 
to interchangably in this example as "run flags". At 

10 line 2 of the table of Figure 4, the runflags are causing 
those processing elements who pass the test to be enabled 
and those who did not to be disabled. Those which are 
enabled are then caused, as shown in the line 3 of the 
table of Figure 4, to execute a statement, in this example 

15 chosen to be to set the data output (D^) equal to 1 of the 
enabled processing elements. The disabled processing 
elements do nothing at this time. 

An ELSE instruction is next presented to all 
the processing elements for execution, which is to say 

20 that those processors who failed the IF test are now going 
to be called upon to do something different, as illus- 
trated in lines 4 and 5 of the. table of Figure 4. The ELSE 
processor instruction is accompanied by the status 
instruction 3 which causes the control circuits, illus- 

25 trated in Figure 2, to all invert the states of their 
flip-flops. That can be seen by comparing the run flags 
of lines 3 and 4 of Figure 4, one being the complement of 
the other. Once the processors previously disabled are 
enabled, a statement is executed, as shown in line 5 of 

30 Figure 4, wherein in this example the output data value is 
set equal to the input data value. The result of the 
routine illustrated in Figure 4 is thus to set the value 
of the data output lines 27 and 31 equal to 1, and output 
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lines 29 and 33 equal to the value of the corresponding 
data input* Complementary operation of the processors to 
execute the IF and ELSE instructions is made possible by a 
simple provision in each of the control circuits for 
inverting all of their states in response to a single 
status instruction. 

The logic circuits 49 of Figure 1 are useful for 
detecting conditions wherer because of a particular 
combination of input data^ certain instructions need not 
be executed. In such a case, the micro-sequencer 47 is 
then caused to skip the unexecutable instructions. Logic 
circuits 49 may be omitted in implementations where 
unexecuted instruction sequences may be allowed to occur. 
In the example of Figure 4, if the test results shown in 
line 2 had all been 0, then there is no need to execute the 
statement of line 3 since all processors would be 
disabled. For this particular example, therefore, the 
logic circuits 49 are designed to detect when all 
processor test results are false (0) and causes the 
condition code in the line 51 to change, with the 
resultant change of the instruction sequence issued by 
the micro-sequencer 47. Additionally, if the test re-- 
suits are all true (1), then the instructions at lines 4 
and 5 of Figure 4 do not need to be executed, so the 
condition code in the line 51 can cause that instruction 
sequence to be bypassed, as well. A signal in line 53 
functions to allow testing for any false (0) condition or 
any true (1) condition. Thus, the ability is provided 
(in conjunction with the status instruction on the bus 43) 
for testing for any or all conditions true or false. 

An example of specific logic for carrying out 
these functions is given in Figure 8. An OR gate 52 has 
as its inputs the test result lines from all of the 
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processing elements. The gate's output is one input of 
an exclusive OR gate 54, the select line S3 being the 
second input. The output of the gate 54 is the condition 
code line 51. The gate 54 operates to pass through the 
output of the gate 52 when the select line 53 is false (0), 
and to pass a complement of that output when the line 53 is 
true (1). 

Certain applications will require the ability 
of the individual processing element control circuits to 
handle a set of instructions that is nested within an IF- 
THEN-ELSE series of instructions. When this is required, 
the run flags determined as the result of executing the IP 
instruction are stored while the nested set of instruc- 
tions is being executed. Once the nested instructions 
have been executed, the stored run flags are called out of 
memory so that the remainder of the IF-THEN-ELSE set of 
instructions can be executed. 

The control circuit of Figure 5 allows such 
•nested program instruction operation. Added to the 
system circuit of Figure 1 is a stacked memory 81, and 
associated controlling decoder circuits 83. The cir- 
cuits within the dotted outline of Figure 5 are not 
repeated within each of the four control circuits of 
Figure 1, but rather are shared by them. The decoding 
circuits 83 respond to status instructions in the bus 43 
to cause the current enable signals (run flags) of each of 
the control circuits to be stored in the stack memory 81 
(a "push*) through lines 85 or to be read from memory (a 
"pop") through lines 87. As is well known, stack 
memories read ("pop") the last written ("pushed") data. 
And each time data is written when there already is data 
in the stack memory, the existing data is pushed to a 
lower level in a manner that it can be read out of the 
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memory only after the most recently written data is read 
out. In other words, data is read out in a £irst-in, 
last-out sequence. 

Returning to Figure 5, the circuitry of each of 
5 the four control circuits of Figure 1 is described for 
this embodiment. A flip-flop 91 of the same type used in 
the embodiment of Figure 2 is employed, with this output 
being the enable signal, one bit of the four-bit run flag. 
Its input in a line 93 is also connected to an output of a 

10 multiplexer 95. The multiplexer, however, has five 
positions 0-4, one more than used in the embodiment of 
Figure 2. One of these inputs is selected at a time for 
connection to the input of the flip-flop 91 by the status 
instruction in the bus 43. The 0 input is connected 

15 directly to the flip-flop output, thus serving to hold the 
flip-flop in whatever state it is found when switched to 
that position. Input 1 of the multiplexer receives the 
output of AND gate 97, having as one input the output of 
the flip-flop 91 and as the other input test result line 

20 45 of its associated processor. As indicated in the 
table of Figure 6, the status instruction 1 is also 
decoded by circuits 83 to store ("push") at the top of the 
stack memory 81 the output (run flags) of the flip-flops 
within the control circuits of Figure 1, 

25 Multiplexer input 2 is connected to the set 

line 41. Input number 3 is connected to the stack memory 
81 for setting the flip-flops in accordance with what has 
previously been recorded at the top of the stack. The 
decoding circuits 83 cause the top stack data of the 

30 memory 81 to pop when the status instruction 3 is 
received. 

The last input of the multiplexer 95, switched 
in response to a status instruction number 4, receives the 
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output of another AND gate 99 whose two inputs are 
connected to the stack memory output and the output of the 
flip-flop 91 through an inverter 101. The result is to 
AND together the data stored at the top of the stack and a 
complement of the current run flags. 

The control circuit of Figure 5, whose logical 
operation is shown in the table of Figure 6, is especially 
adapted for carrying out the sequence of operations given 
in Figure 7. In that sequence, an IF-THEN-ELSE sequence 
of program instructions is executed at lines 1, 2, 3, 9, 
10, 11, 17, 18, and 19. Nested inside the IF or ELSE 
portions of that set of instructions is yet another IF- 
THEN-ELSE series of instructions, at lines 4-8. 
Similarly, a second set of such statements is nested at 
lines 12-16 within the basic sequence of instructions. 
In each of the three IF-THEN-ELSE series of instructions, 
a different test result is assumed, as shown in the "test 
result" column of Figure 7. These different test results 
cause different run flags for each of the three IF-THEN- 
ELSE series of instructions. The dotted arrows show the 
flow of run flag bits in the course of the operation of the 
stack memory 81, those arrows pointing generally to the 
right being the result of a push operation and those 
generally to the left the result of a pop operation. 

Although the various aspects of the present 
invention have been described with respect to its 
preferred embodiments, it will be understood that this 
invention is entitled to protection within the full scope 
of the appended claims. 
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IT IS CLAIMED; 

1. In a computer system having a plurality of 
processors that each operate in a different data path, all 
of said processors connected to receive a conunon instruc- 

5 tion at any instant of time, a system for executing at 
least first and second instructions in time sequence 
wherein the first instruction causes the data in said 
paths to be tested against a common condition and the 
second instruction is executed only in those paths where 
10 the test of the first instruction has provided a certain 
pre-defined result, comprising: 

a register associated with each of said pro- 
cessors in a manner that the processor is enabled to 
execute an instruction when the register, is in a first 
15 state and disabled when in a second state, 

means connecting each processor with its asso- 
ciated register for forcing said register into its said 
second state when the test of the first instruction gives 
said pre-defined result and into said first state when 
20 said test does not give said pre-defined result, and 

means responsive to said second instruction for 
changing the state of said registers prior to any of said 
processors executing said second instruction, whereby the 
processors executing the second instruction are those 
25 where the condition test of the first instruction was said 
certain pre-defined result and the other processors are 
disabled during the second instruction. 

2. The computer system according to claim 1 
wherein said first instruction includes an IF instruc- 

30 tion, and said second instruction includes an ELSE 
-instruction. 
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3. The computer system according to claim 1 
that additionally comprises a memory means connected to 
said registers for temporarily storing and replacing the 
contents of said registers, whereby other instructions 
5 may be nested between execution of said first and second 
instructions without losing the results of the test of the 
first instruction for use when execution of the second 
instruction resumes. 
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